ABSTRACT

A convenient and useful method for showing weak convergence, to a diffusion, of the interpolated solutions of a (not necessarily Markovian) sequence of stochastic difference equations is developed. The technique involves the use of averaging methods to show that the weak limit satisfies the martingale problem of Stroock and Varadhan which is associated with the diffusion. A truncation method is developed so that it is only necessary to work with the parts of the process before first escape from an arbitrary but bounded domain. The assumptions cover a wide variety of applications in systems theory, mathematical biology and elsewhere, but the method of proof is adaptable to other special cases where our particular assumptions might not hold. Two applications are given in order to illustrate the relative ease of use of the method. The driving noise process in the difference equations can depend on the solution process of the difference equation, and one application where this is useful is given (a rate of convergence problem for simple stochastic approximations with sequentially averaged observations).
1. Introduction

The paper develops a general method for proving weak convergence to a diffusion process of the sequence of appropriately scaled and interpolated solutions to the (not necessarily Markovian) equation (1.1), with $X_0^\varepsilon$ given.

The $\{\xi_n^\varepsilon\}$ are a sequence of random variables whose distributions might depend on the $\{X_n^\varepsilon\}$. The method is convenient to use and has wide applicability. In order to illustrate the use of the method,
applications  to  the  rate  of  convergence  for  a  general  form  of 
stochastic  approximation  and  to  a  problem  of  Guess  and  Gillespie  [1] 
are  treated.  For  the  latter  problem,  the  treatment  in  [1]  required 
an  explicit  construction  of  the  solution  -  essentially  limiting  the 
treatment  to  the  scalar  case.  There  is  no  such  restriction  here. 


Define $X^\varepsilon(\cdot)$ by $X^\varepsilon(t) = X_n^\varepsilon$ on $[n\varepsilon, n\varepsilon+\varepsilon)$. The basic idea is to prove that $\{X^\varepsilon(\cdot)\}$ converges weakly to the solution of the martingale problem [2] connected with the diffusion process. In recent years
many  nice  results  for  dealing  with  weak  convergence  of  a  sequence 
of  non-Markov  continuous  parameter  processes  to  a  Markov  process 
have  been  developed  [3] -[5],  but  the  discrete  parameter  case  is  not 
in  such  good  shape. 
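The piecewise constant interpolation can be made concrete with a small simulation. The following Python sketch is illustrative only: the scalar recursion `X[n+1] = X[n] + eps*K(X[n]) + sqrt(eps)*sigma*xi[n]` with i.i.d. $\pm 1$ noise, the drift `K`, and all function names are assumptions chosen for the example, not the general equation treated in the paper.

```python
import math
import random


def simulate(eps, n_steps, x0=0.0, K=lambda x: -x, sigma=1.0, seed=0):
    """Iterate X[n+1] = X[n] + eps*K(X[n]) + sqrt(eps)*sigma*xi[n].

    The noise xi[n] is i.i.d. +/-1 (a hypothetical bounded noise model).
    Returns the list [X[0], ..., X[n_steps]].
    """
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps):
        xi = rng.choice((-1.0, 1.0))
        xs.append(xs[-1] + eps * K(xs[-1]) + math.sqrt(eps) * sigma * xi)
    return xs


def interpolate(xs, eps, t):
    """Piecewise constant interpolation: X^eps(t) = X[n] on [n*eps, n*eps + eps).

    A tiny tolerance guards against floating point rounding when t sits
    exactly on a grid point n*eps.
    """
    n = int(math.floor(t / eps + 1e-9))
    return xs[min(n, len(xs) - 1)]


eps = 0.01
xs = simulate(eps, n_steps=1000)
```

Each increment has size at most `eps*|K(x)| + sqrt(eps)*sigma`, so the jumps of the interpolated path shrink as `eps` decreases, which is the scaling under which a diffusion limit can be expected.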

The basic background theorems are in [6], where some "continuous parameter" applications are given. That reference emphasizes the continuous parameter case. But the method is often easier to use and can handle many types of interesting problems in the discrete parameter case, and here we show how to use it effectively for a broad class of problems. The method of proof is interesting in itself and can be adapted to special cases when our assumptions do not hold.

For each $\varepsilon > 0$, let $\{\mathscr{F}_t^\varepsilon\}$ be an increasing sequence of $\sigma$-algebras which measures $\{X^\varepsilon(s), s \le t\}$ and $E_t^\varepsilon$ the corresponding conditional expectation operator. Write $E_{n\varepsilon}^\varepsilon = E_n^\varepsilon$. Let $\mathscr{L}^\varepsilon$ denote the measurable functions $f(\cdot)$ (of $(\omega,t)$) which are constant on the $[n\varepsilon, n\varepsilon+\varepsilon)$ intervals and $\mathscr{F}_t^\varepsilon$ measurable at time $t$, and satisfy $\sup_{t \ge 0} E|f(t)| < \infty$. Let $f^\varepsilon(\cdot) \in \mathscr{L}^\varepsilon$. If

$$\sup_{t \ge 0,\ \varepsilon > 0} E|f^\varepsilon(t)| < \infty \quad \text{and} \quad \lim_{\varepsilon \to 0} E|f^\varepsilon(t)| = 0 \ \text{ for each } t,$$

we say (following the terminology in [3], [6]) that $\operatorname{p-lim}_{\varepsilon \to 0} f^\varepsilon(\cdot) = 0$. Define $\hat A^\varepsilon$ on $\mathscr{L}^\varepsilon$ by

$$\hat A^\varepsilon f(t) = [E_t^\varepsilon f(t+\varepsilon) - f(t)]/\varepsilon.$$

$\hat A^\varepsilon$ is an approximation "in some sense" of the weak infinitesimal operator of the limit process.
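As a rough illustration of how the difference quotient operator approximates a generator, consider the following worked case. The scalar random walk, the i.i.d. noise with $P\{\xi_n = \pm 1\} = 1/2$ independent of the past, and the bounded test function $f$ with bounded derivatives up to third order are all assumptions made for this example only.

```latex
% Assumed example: X_{n+1} = X_n + \sqrt{\varepsilon}\,\xi_n, with
% E\xi_n = 0, E\xi_n^2 = 1, |\xi_n| = 1, and \xi_n independent of the past.
% Take f(t) = f(X^\varepsilon(t)) and expand by Taylor's formula.
\begin{align*}
\hat A^\varepsilon f(n\varepsilon)
  &= \frac{1}{\varepsilon}\, E_n^\varepsilon
     \bigl[ f(X_n + \sqrt{\varepsilon}\,\xi_n) - f(X_n) \bigr] \\
  &= \frac{1}{\varepsilon}
     \Bigl[ \sqrt{\varepsilon}\, f_x(X_n)\, E\xi_n
          + \tfrac{\varepsilon}{2}\, f_{xx}(X_n)\, E\xi_n^2
          + O(\varepsilon^{3/2}) \Bigr] \\
  &= \tfrac{1}{2} f_{xx}(X_n) + O(\sqrt{\varepsilon}).
\end{align*}
```

Under these assumptions the difference between the operator applied to $f$ and $\tfrac{1}{2} f_{xx}$ is $O(\sqrt{\varepsilon})$ uniformly, so its p-lim is zero, and $\tfrac{1}{2}\,d^2/dx^2$ is the generator of standard Brownian motion.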

Reference [3] contains a very interesting method for the continuous parameter case, with some remarks on how it might be used for the discrete parameter case. The method here seems easier to use, and it is easier to construct the perturbation $\{f^\varepsilon(\cdot)\}$ with our method. Some of our results were strongly motivated by the techniques in [3].

Section 2 contains some assumptions on the limit process, and a sequence of truncated processes is introduced in Section 3. The use of these truncated processes will facilitate the tightness proof, and allows us to work only with $X^\varepsilon(\cdot)$ and the limit until the first escape time from an arbitrary but bounded region. The general background limit theorem and the tightness theorem from [6] are stated in Section 4 in the form which will be most useful to us. In Section 5, specific assumptions for (1.1) are given when $\{\xi_n^\varepsilon\}$ does not depend on $\{X_n^\varepsilon\}$, and the theorems of Section 4 are applied to (1.1) in Section 6. Sections 7 and 8 illustrate the general method with two applications. The modifications when $\{\xi_n^\varepsilon\}$ is $\{X_n^\varepsilon\}$ dependent are discussed in Section 9, and an application to the rate of convergence problem for a stochastic approximation with "averaged" observations appears in Section 10. From a notational point of view, it is much simpler to treat the case of non-state-dependent noise first.


2. Assumptions on the Limit Process

Some assumptions on (what will be) the limit process are required. Let $\hat{\mathscr{C}}$ denote the real valued functions on $R^+ \times R^r$ which are zero at $\infty$, $\hat{\mathscr{C}}_0$ the subset with compact support, and $\hat{\mathscr{C}}_0^{\alpha,\beta}$ the subset whose mixed partial $(t,x)$ derivatives up to orders $(\alpha,\beta)$ are continuous. Let

$$A = \sum_i b_i(x,t)\frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i,j} a_{ij}(x,t)\frac{\partial^2}{\partial x_i \partial x_j}$$

denote a diffusion operator with continuous coefficients.

Next, an existence and uniqueness condition is needed. $D^r[0,\infty)$ denotes the usual space [7] of $R^r$ valued functions which have left hand limits and are right continuous, with the Skorokhod topology. Let $x(\cdot)$ denote the generic element of $D^r[0,\infty)$. For each $x \in R^r$, we assume that there is a measure $P_x$ on $D^r[0,\infty)$ such that $P_x\{x(0) = x\} = 1$ and

(2.1)  $$P_x\{\sup_{t \le T} |x(t)| < \infty\} = 1, \quad \text{each } T < \infty,$$

and which is the unique solution to the martingale problem of Stroock and Varadhan; namely, for each $f(\cdot,\cdot) \in \hat{\mathscr{C}}_0^{1,2}$ and $x \in R^r$, the $M_f(\cdot)$ below is a $P_x$ martingale:

(2.2)  $$M_f(t) = f(x(t),t) - f(x(0),0) - \int_0^t \Bigl(\frac{\partial}{\partial s} + A\Bigr) f(x(s),s)\,ds.$$

We work on $D^r[0,\infty)$ rather than on $C^r[0,\infty)$ because it is easier to prove tightness on the former space. The measure $P_x$ is concentrated on the subset of $D^r[0,\infty)$ of continuous functions, in any case. When the measure is given, the corresponding process solution to the martingale problem will be written as $X(\cdot)$ in order to distinguish it from the generic element $x(\cdot)$. We still have existence and uniqueness if the initial value $x$ is replaced by a random variable $X(0)$. Below $X(0)$ will be the weak limit of $\{X_0^\varepsilon\}$.

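3. Truncated Processes

The idea of the proof in [6] is to first prove tightness and then to show that all weak limits solve the same martingale problem whose process solution $X(\cdot)$ is unique (in the sense of measure, of course). To facilitate the proof of tightness, it is convenient to bound the random functions $X^\varepsilon(\cdot)$ and $X(\cdot)$ by altering them after they first leave the sphere $S_N = \{x: |x| \le N\}$ and stopping them after first exit from $S_{N+1}$. It is then proved that for each $N$ the sequence of $X^\varepsilon(\cdot)$ before first exit from $S_N$ converges weakly to the part of the diffusion $X(\cdot)$ before first exit from $S_N$. Finally, the uniqueness and (2.1) (i.e., infinite escape time for the diffusion) are used to get that the unaltered $X^\varepsilon(\cdot)$ converge weakly to $X(\cdot)$ as desired. Let $q_N(\cdot)$ denote a continuous function which takes the value unity on $S_N$, zero on $R^r - S_{N+1}$, has values in $[0,1]$ and first and second derivatives uniformly bounded in $x, N$. It is convenient to write (1.1) in the slightly expanded form

(3.1)  $$X_{n+1}^\varepsilon = X_n^\varepsilon + \varepsilon K_\varepsilon(X_n^\varepsilon) + \varepsilon h_\varepsilon(X_n^\varepsilon,\xi_n^\varepsilon) + \sqrt{\varepsilon}\, g_\varepsilon(X_n^\varepsilon,\xi_n^\varepsilon) + o(\varepsilon)\, k_\varepsilon(X_n^\varepsilon,\xi_n^\varepsilon),$$

where conditions on the functions will be given below.

Let the subscript $N$ denote multiplication by $q_N(\cdot)$; i.e., $h_{\varepsilon,N}(\cdot,\cdot) = h_\varepsilon(\cdot,\cdot)q_N(\cdot)$, etc. For each $N$, we define the truncated or altered process $\{X_n^{\varepsilon,N}\}$ by

(3.2)  $$X_{n+1}^{\varepsilon,N} = X_n^{\varepsilon,N} + \varepsilon K_{\varepsilon,N}(X_n^{\varepsilon,N}) + \varepsilon h_{\varepsilon,N}(X_n^{\varepsilon,N},\xi_n^\varepsilon) + \sqrt{\varepsilon}\, g_{\varepsilon,N}(X_n^{\varepsilon,N},\xi_n^\varepsilon) + o(\varepsilon)\, k_{\varepsilon,N}(X_n^{\varepsilon,N},\xi_n^\varepsilon).$$

The sequence $\{X_n^{\varepsilon,N}\}$ equals $\{X_n^\varepsilon\}$ until at least the first time that the latter exits from $S_N$.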
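A cutoff function of the kind used for the truncation, and one step of a truncated recursion, can be sketched as follows. This is only one possible construction under stated assumptions: the quintic "smoothstep" profile, the restriction $N \ge 1$, and the names `smoothstep5`, `q_N` and `truncated_step` are choices made for this example, and the step keeps only a drift term `K` and a noise term `g` (the other terms of the recursion are omitted for brevity).

```python
import math


def smoothstep5(u):
    """Quintic smoothstep: 0 for u <= 0, 1 for u >= 1.

    Its first and second derivatives vanish at u = 0 and u = 1, so the
    glued function is twice continuously differentiable.
    """
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return u ** 3 * (10.0 - 15.0 * u + 6.0 * u * u)


def q_N(x, N):
    """Cutoff with value 1 on |x| <= N, 0 on |x| >= N + 1, values in [0, 1].

    The profile depends only on |x| - N, so the derivative bounds do not
    depend on N (assuming N >= 1, which keeps the transition zone away
    from the origin where |x| is not smooth).
    """
    r = math.sqrt(sum(c * c for c in x))
    return 1.0 - smoothstep5(r - N)


def truncated_step(x, xi, eps, N, K, g):
    """One step of a truncated recursion: each increment term is multiplied by q_N(x).

    K(x) and g(x, xi) are hypothetical vector valued drift and noise functions.
    """
    q = q_N(x, N)
    kx, gx = K(x), g(x, xi)
    return [xc + q * (eps * kc + math.sqrt(eps) * gc)
            for xc, kc, gc in zip(x, kx, gx)]
```

Inside the sphere of radius $N$ the truncated step coincides with the untruncated one, and once the state is outside the sphere of radius $N+1$ the increments vanish, so the truncated sequence stays bounded.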

Let $A^N$ denote a diffusion operator of the form of $A$ in Section 2 and whose coefficients $a^N(\cdot,\cdot)$ and $b^N(\cdot,\cdot)$ are continuous and equal $a(\cdot,\cdot)$ and $b(\cdot,\cdot)$, resp., in $S_N$. Suppose that a process $X^N(\cdot)$ solves (not necessarily uniquely) the martingale problem corresponding to operator $A^N$ and (perhaps random) initial condition $X^N(0)$. If $X^N(0) \to X(0)$ weakly, then we call $X^N(\cdot)$ an $N$-truncation of $X(\cdot)$, the unique solution of the martingale problem for initial condition $X(0)$ and with operator $A$. The terms $\hat A^{\varepsilon,N}$, $E_t^{\varepsilon,N}$ and $\mathscr{L}^{\varepsilon,N}$ are defined analogously to $\hat A^\varepsilon$, etc., but for the process $X^{\varepsilon,N}(\cdot)$ instead of the process $X^\varepsilon(\cdot)$, where $X^{\varepsilon,N}(\cdot)$ is the piecewise constant (on the $[n\varepsilon, n\varepsilon+\varepsilon)$ intervals) interpolation of $\{X_n^{\varepsilon,N}\}$; in particular $X_n^{\varepsilon,N} = X^{\varepsilon,N}(n\varepsilon)$.
4. The Limit Theorems

Theorems 1 and 2 are taken from [6], but are rephrased for the convenience of the class of problems dealt with here. In the following sections, they will be applied to (1.1) and (3.1).

In Theorems 1 and 2, $\{X_n^\varepsilon, n \ge 0\}$ is an arbitrary sequence for each $\varepsilon > 0$, with interpolation denoted by $X^\varepsilon(\cdot)$ ($X^\varepsilon(t) = X_n^\varepsilon$ on $[n\varepsilon, n\varepsilon+\varepsilon)$). It need not be defined by (1.1) or (3.1). The $\{X_n^{\varepsilon,N}\}$ is any sequence such that $X_n^{\varepsilon,N} = X_n^\varepsilon$ until at least the first exit time of $\{X_n^\varepsilon\}$ from $S_N$. Part of our basic structure (e.g., use of $\hat A^\varepsilon$ and the perturbed $f^{\varepsilon,N}(\cdot)$ below) is motivated by that of Kurtz [3]. The proofs are much different and do not require the semigroup machinery of [3] and its predecessors. An extensive development of a martingale approach to limit theorems for a class of continuous parameter Markov processes appears in [12].


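Theorem 1. Assume the conditions of Sections 2 and 3 on the martingale problem with operator $A$, and on $a^N(\cdot,\cdot)$ and $b^N(\cdot,\cdot)$. Let $X_0^\varepsilon \to X_0$ weakly as $\varepsilon \to 0$. For each $N$ and $f(\cdot,\cdot) \in \mathscr{D}$, a dense set in $\hat{\mathscr{C}}$, let there be a sequence $\{f^{\varepsilon,N}(\cdot)\}$, where $f^{\varepsilon,N}(\cdot) \in \mathscr{L}^{\varepsilon,N}$ and such that

(4.1)  $$\operatorname{p-lim}_{\varepsilon \to 0}\, [f^{\varepsilon,N}(\cdot) - f(X^{\varepsilon,N}(\cdot),\cdot)] = 0,$$

(4.2)  $$\operatorname{p-lim}_{\varepsilon \to 0}\, \Bigl[\hat A^{\varepsilon,N} f^{\varepsilon,N}(\cdot) - \Bigl(\frac{\partial}{\partial t} + A^N\Bigr) f(X^{\varepsilon,N}(\cdot),\cdot)\Bigr] = 0.$$

Then, if $\{X^{\varepsilon,N}(\cdot), \varepsilon > 0\}$ is tight in $D^r[0,\infty)$ for each $N$, $X^\varepsilon(\cdot)$ converges weakly to $X(\cdot)$, the unique solution to the martingale problem with initial condition $X(0) = X_0$.

Let $\hat C_0$ denote the space of real valued continuous functions on $R^r$, with compact support.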


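Theorem 2. For each $T < \infty$ and $N < \infty$ suppose that

(4.3)  $$\lim_{K \to \infty} \limsup_{\varepsilon \to 0} P\{\sup_{t \le T} |X^{\varepsilon,N}(t)| \ge K\} = 0.$$

For each $N$ and each $f(\cdot)$ in a dense set in $\hat C_0$, let $\{f^{\varepsilon,N}(\cdot)\}$ be a sequence in $\mathscr{L}^{\varepsilon,N}$ such that (4.4)-(4.6) hold.

(4.4)  $$\lim_{\varepsilon \to 0} P\{\sup_{t \le T} |f^{\varepsilon,N}(t) - f(X^{\varepsilon,N}(t))| \ge \alpha\} = 0, \quad \text{each } \alpha > 0,\ T < \infty.$$

For each $T < \infty$, let there be a random variable $M_T^{\varepsilon,N}(f) \ge 0$ such that

(4.5)  $$\lim_{K \to \infty} \sup_{\varepsilon > 0} P\{M_T^{\varepsilon,N}(f) \ge K\} = 0,$$

where

(4.6)  $$\sup_{t \le T} |\hat A^{\varepsilon,N} f^{\varepsilon,N}(t)| \le M_T^{\varepsilon,N}(f).$$

Then $\{f(X^{\varepsilon,N}(\cdot))\}$, each $f(\cdot) \in \hat C_0$, and $\{X^{\varepsilon,N}(\cdot)\}$ are tight in $D^r[0,\infty)$.

The usefulness and relative ease of application of Theorems 1 and 2 will become apparent in the following sections.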


5. Assumptions for (3.1)

Two forms will be dealt with. The first allows $h_\varepsilon$, etc., to have a rather arbitrary dependence on $\xi$, but uses

(A1a) For each $\varepsilon$, $\{\xi_n^\varepsilon\}$ is stationary, bounded, $\phi$-mixing (see [7]) uniformly in $\varepsilon$, with mixing rate satisfying $\sum_n \phi_n^{1/2} < \infty$.

In many applications, it is desired to have $\{\xi_n^\varepsilon\}$ unbounded (say Gaussian). Under additional restrictions on $h_\varepsilon, g_\varepsilon, k_\varepsilon$, this can be treated for particular cases of $\{\xi_n^\varepsilon\}$. We also treat the case, important in practice, where (A1b) holds in lieu of (A1a).

(A1b) Let the functions in (3.1) take the forms $g_\varepsilon(x,\xi) = g_\varepsilon(x)\xi$, $h_\varepsilon(x,\xi) = h_\varepsilon(x)\xi$, and for some $\alpha > 0$, $c > 0$, $k_\varepsilon(x,\xi) = k_\varepsilon(x)\,O(|\xi|^c + 1)$ and the $o(\varepsilon)$ coefficient of $k_\varepsilon$ is $\varepsilon^{1+\alpha}$. Let there be a matrix $L$ and a vector valued stationary Markov

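process $\{\tilde\xi_n^\varepsilon\}$ such that $E\tilde\xi_n^\varepsilon = 0$, all moments are finite and $\xi_n^\varepsilon = L\tilde\xi_n^\varepsilon$. Let $\{R_i\}$ be such that $E[\tilde\xi_i^\varepsilon \mid \tilde\xi_0^\varepsilon] = R_i\tilde\xi_0^\varepsilon$ and $\sum_i |R_i|^{1/2} < \infty$. Suppose that there are $\{\rho_i\}$ such that $\sum_i \rho_i^{1/2} < \infty$ and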

(5.1)  lE<VilV  '  E(Vpl  *  V1  +  l^0l2)- 

For  each  a  >  0,  there  is  a  8  >  0  such  that  E C I | 01 1 )  < 
(constant) (|Tq| ^  + 1) . 

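Remarks on (A1a,b) follow (A8).

When Case (A1b) holds, the $k_\varepsilon(x,\xi)$, etc., is to be replaced by $k_\varepsilon(x)$, etc., in (A2) and (A5) below.

(A2) $k_\varepsilon(\cdot,\cdot)$ is measurable, and uniformly bounded on bounded $x$-sets. $h_\varepsilon(\cdot,\cdot)$, $g_\varepsilon(\cdot,\cdot)$ and $K_\varepsilon(\cdot)$ are measurable and continuous in $x$ for each $\xi$. They are bounded on bounded $x$-sets uniformly in $\varepsilon, \xi$.

(A3) There is a continuous $K(\cdot)$ such that $K_\varepsilon(\cdot) \to K(\cdot)$ uniformly on bounded $x$-sets.

(A4) $Eh_\varepsilon(x,\xi_n^\varepsilon) = 0$, each $n, \varepsilon, x$.

(A5) $g_\varepsilon(\cdot,\xi)$ and $h_\varepsilon(\cdot,\xi)$ are (twice, once, resp.) continuously differentiable for each $\xi, \varepsilon$; the derivatives are bounded on bounded $x$-sets, uniformly in $\xi, \varepsilon$.

(A6) There is a continuous function $G_0(\cdot)$ such that

$$E g_\varepsilon(x,\xi_0^\varepsilon)\, g_\varepsilon'(x,\xi_0^\varepsilon) \to G_0(x)$$

uniformly on bounded $x$-sets. There are continuous (and symmetric, w.l.o.g.) $a_{ij}(\cdot)$ such that for each $f(\cdot,\cdot) \in \hat{\mathscr{C}}_0^{1,2}$,

$$\frac{1}{2} E g_\varepsilon'(x,\xi_0^\varepsilon) f_{xx}(x,t) g_\varepsilon(x,\xi_0^\varepsilon) + \sum_{j=1}^{\infty} E g_\varepsilon'(x,\xi_j^\varepsilon) f_{xx}(x,t) g_\varepsilon(x,\xi_0^\varepsilon) \to \frac{1}{2}\sum_{i,j} a_{ij}(x) f_{x_i x_j}(x,t)$$

uniformly on bounded $x$-sets. Thus, the second term on the left also converges uniformly to the above right hand side minus $\frac{1}{2}\operatorname{trace} G_0(x) f_{xx}(x,t)$.

(A7) There is a continuous $\bar g(\cdot)$ such that

$$\sum_{j=1}^{\infty} E g_{\varepsilon,x}(x,\xi_j^\varepsilon)\, g_\varepsilon(x,\xi_0^\varepsilon) \to \bar g(x)$$

uniformly on bounded $x$-sets as $\varepsilon \to 0$, where

$$g_{\varepsilon,x}(x,\xi) = \begin{pmatrix} g_{1,x_1}(x,\xi) & \cdots & g_{1,x_r}(x,\xi) \\ \vdots & & \vdots \\ g_{r,x_1}(x,\xi) & \cdots & g_{r,x_r}(x,\xi) \end{pmatrix}.$$

(A8) There is a unique solution (for each initial condition) to the martingale problem in $D^r[0,\infty)$ with operator

$$A = \frac{1}{2}\sum_{i,j=1}^{r} a_{ij}(x)\frac{\partial^2}{\partial x_i \partial x_j} + \sum_{i=1}^{r} [K_i(x) + \bar g_i(x)]\frac{\partial}{\partial x_i},$$

where $K_i(\cdot)$ and $\bar g_i(\cdot)$ are the $i$th components of the vectors $K(\cdot)$ and $\bar g(\cdot)$, resp.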

Remarks on (A1a,b). We note first that (5.1) holds if $\{\tilde\xi_n\}$ is a zero mean stationary Gaussian process with correlation $E\tilde\xi_n\tilde\xi_0' = R_n$ and $\sum_i |R_i|^{1/2} < \infty$. For notational convenience, we show it in the unit variance and scalar case. We can write $\tilde\xi_n = R_n\tilde\xi_0 + \psi_n$, where $\tilde\xi_0$ and $\psi_n$ are zero mean and independent. The sentence below (5.1) obviously holds here. Also, equation (5.1) is implied by the calculations $E\tilde\xi_n^2 = 1 = R_n^2 + E\psi_n^2$,




It can be seen from the method of proof that the theorem would remain valid under weaker conditions than (A1a) (if the case (A1b) does not hold). Then the special structure of the functions in (3.1) would need to be taken into account. The proof is given for the broad and standard cases (A1a,b), but its general outline would be followed in other cases. For example, there are important examples in communication theory where a system processes some wide-band and unbounded noise input non-linearly. Somewhat analogous "continuous parameter" techniques have been developed [8] to deal with a number of these important but special cases. Presumably, similar results are possible in the discrete parameter case.

In the state-dependent noise case of Section 9, (A1) is not used, and is replaced by a longer list of the (weaker) specific conditions which are actually used in the proof (of either Theorems 3 or 4). See also the comments on the Markov case in Section 9, where the conditions on the continuity and differentiability of the functions are weakened. We note also that the proofs can readily be adapted to the traditional stochastic approximation case where $\varepsilon$ is replaced by a sequence $\{\varepsilon_n\}$ with $\varepsilon_n > 0$ used at iterate $n$, and $\sum_n \varepsilon_n = \infty$, $\varepsilon_n \to 0$.

6. The Main Convergence Theorem

The same $\{f^{\varepsilon,N}(\cdot)\}$ will be used to satisfy the requirements of both Theorems 1 and 2. Given $f(\cdot,\cdot)$ in the dense set of Theorem 1, we construct $f^{\varepsilon,N}(\cdot)$ via a discrete parameter analog of the method used in [5], [6], [9]. The $\{X_n^\varepsilon\}$ and $\{X_n^{\varepsilon,N}\}$ are defined by (3.1) and (3.2), resp., here.

Theorem 3. Let $X_0^\varepsilon \to X_0$ weakly. Under (A1a or b) and (A2)-(A8), $\{X^\varepsilon(\cdot)\}$ is tight in $D^r[0,\infty)$ and, as $\varepsilon \to 0$, converges weakly to the diffusion determined by the solution to the martingale problem with the operator $A$ of (A8), and initial condition $X_0 = X(0)$.


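Proof. Part 1. Fix $f(\cdot,\cdot)$ in the dense set of Theorem 1 and fix $N$ throughout the proof. An $\{f^{\varepsilon,N}(\cdot)\}$ satisfying (4.1), (4.2) will be found. In order to simplify the notation in the proof, the superscript $N$ on $\hat A^{\varepsilon,N}$, $E_n^{\varepsilon,N}$, $X_n^{\varepsilon,N}$, $X^{\varepsilon,N}(\cdot)$ and $X^N(\cdot)$ and the subscript $N$ on $h_{\varepsilon,N}$, etc., will henceforth be dropped (in the proof only), but we always work here with $X_n^{\varepsilon,N}$, $h_{\varepsilon,N}$, etc., only, and not with the original $X_n^\varepsilon$ and $h_\varepsilon$, etc. The proof under (A1a) will be given first. The simple modifications required under (A1b) will then be stated.

Thus we can suppose that there is a constant $K_N$ such that $|X_n^\varepsilon| \le K_N$, $|X_{n+1}^\varepsilon - X_n^\varepsilon| \le K_N\sqrt{\varepsilon}$, and that $|h_\varepsilon|$, $|g_\varepsilon|$, etc. (as defined under (A1a) or (A1b)) are bounded above by $K_N$.

The $O(\cdot)$, $o(\cdot)$ and $o_i(\cdot)$ terms (with or without subscripts) are uniform in all variables (except $N$, which is fixed throughout) other than their arguments, unless otherwise stated. Evaluate

(6.1)  $$\begin{aligned}
\varepsilon \hat A^\varepsilon f(X_n^\varepsilon, n\varepsilon) &= E_n^\varepsilon f(X_{n+1}^\varepsilon, n\varepsilon+\varepsilon) - f(X_n^\varepsilon, n\varepsilon) \\
&= E_n^\varepsilon [f(X_{n+1}^\varepsilon, n\varepsilon+\varepsilon) - f(X_{n+1}^\varepsilon, n\varepsilon)] + E_n^\varepsilon [f(X_{n+1}^\varepsilon, n\varepsilon) - f(X_n^\varepsilon, n\varepsilon)] \\
&= \varepsilon E_n^\varepsilon f_t(X_{n+1}^\varepsilon, n\varepsilon) + o(\varepsilon) + f_x'(X_n^\varepsilon, n\varepsilon) E_n^\varepsilon [\varepsilon K_\varepsilon(X_n^\varepsilon) + \varepsilon h_\varepsilon(X_n^\varepsilon,\xi_n^\varepsilon) + \sqrt{\varepsilon}\, g_\varepsilon(X_n^\varepsilon,\xi_n^\varepsilon)] \\
&\quad + \frac{\varepsilon}{2} E_n^\varepsilon\, g_\varepsilon'(X_n^\varepsilon,\xi_n^\varepsilon) f_{xx}(X_n^\varepsilon, n\varepsilon) g_\varepsilon(X_n^\varepsilon,\xi_n^\varepsilon) + o_1(\varepsilon).
\end{aligned}$$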
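Part 2. Define $f^\varepsilon(n\varepsilon) = f(X_n^\varepsilon, n\varepsilon) + f_1^\varepsilon(n\varepsilon) + f_2^\varepsilon(n\varepsilon)$, where the $f_i^\varepsilon(\cdot)$ will be selected so that (4.1) and (4.2) hold. We now define $f_1^\varepsilon(\cdot)$ in such a way that we "average out" the dependent parts of (6.1). Define $G_\varepsilon(x,\xi) = g_\varepsilon(x,\xi) g_\varepsilon'(x,\xi)$ and $\bar G_\varepsilon(x) = E G_\varepsilon(x,\xi)$. Define $f_1^\varepsilon(t)$ to be constant on each $[n\varepsilon, n\varepsilon+\varepsilon)$ and satisfy $f_1^\varepsilon(n\varepsilon) = f_1^\varepsilon(X_n^\varepsilon, n\varepsilon)$, where $f_1^\varepsilon(\cdot,\cdot)$ is defined by

$$\begin{aligned}
f_1^\varepsilon(x, n\varepsilon) &= \varepsilon \sum_{l=n}^{\infty} E_n^\varepsilon f_x'(x, n\varepsilon) h_\varepsilon(x,\xi_l^\varepsilon) \\
&\quad + \frac{\varepsilon}{2} \sum_{l=n}^{\infty} [E_n^\varepsilon \operatorname{trace} G_\varepsilon(x,\xi_l^\varepsilon) f_{xx}(x, n\varepsilon) - \operatorname{trace} \bar G_\varepsilon(x) f_{xx}(x, n\varepsilon)] \\
&\quad + \sqrt{\varepsilon} \sum_{l=n}^{\infty} E_n^\varepsilon f_x'(x, n\varepsilon) g_\varepsilon(x,\xi_l^\varepsilon) \equiv T_1 + T_2 + T_3.
\end{aligned}$$

By the $\phi$-mixing and the facts that the expectations of all the summands are zero, the first two terms are $O(\varepsilon)$, and the third $O(\sqrt{\varepsilon})$, uniformly in $x$. Thus, $\operatorname{p-lim}_{\varepsilon \to 0} f_1^\varepsilon(\cdot) = 0$.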

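Now calculate $\hat A^\varepsilon f_1^\varepsilon(\cdot)$ at $t = n\varepsilon$:

(6.2)  $$\begin{aligned}
\varepsilon \hat A^\varepsilon f_1^\varepsilon(n\varepsilon) &= -\varepsilon E_n^\varepsilon f_x'(X_n^\varepsilon, n\varepsilon) h_\varepsilon(X_n^\varepsilon,\xi_n^\varepsilon) \\
&\quad - \frac{\varepsilon}{2} E_n^\varepsilon \operatorname{trace} G_\varepsilon(X_n^\varepsilon,\xi_n^\varepsilon) f_{xx}(X_n^\varepsilon, n\varepsilon) \\
&\quad + \frac{\varepsilon}{2} \operatorname{trace} \bar G_\varepsilon(X_n^\varepsilon) f_{xx}(X_n^\varepsilon, n\varepsilon) \\
&\quad - \sqrt{\varepsilon}\, E_n^\varepsilon f_x'(X_n^\varepsilon, n\varepsilon) g_\varepsilon(X_n^\varepsilon,\xi_n^\varepsilon) \\
&\quad + \varepsilon \sum_{l=n+1}^{\infty} E_n^\varepsilon [f_x'(X_{n+1}^\varepsilon, n\varepsilon+\varepsilon) h_\varepsilon(X_{n+1}^\varepsilon,\xi_l^\varepsilon) - f_x'(X_n^\varepsilon, n\varepsilon) h_\varepsilon(X_n^\varepsilon,\xi_l^\varepsilon)] \\
&\quad + [\text{a term similar to the last but coming from } T_2 \text{ instead of from } T_1] \\
&\quad + \sqrt{\varepsilon} \sum_{l=n+1}^{\infty} E_n^\varepsilon [f_x'(X_{n+1}^\varepsilon, n\varepsilon+\varepsilon) g_\varepsilon(X_{n+1}^\varepsilon,\xi_l^\varepsilon) - f_x'(X_n^\varepsilon, n\varepsilon) g_\varepsilon(X_n^\varepsilon,\xi_l^\varepsilon)].
\end{aligned}$$

The first, second and fourth terms in (6.2) cancel terms in (6.1) (which is the reason for constructing $f_1^\varepsilon(\cdot)$ as we did). The 5th and 6th terms are $o(\varepsilon)$, as will now be shown. It is easy to see that the difference between these terms and their values with $n\varepsilon + \varepsilon$ replaced by $n\varepsilon$ in the $f_x(\cdot,\cdot)$ is $o(\varepsilon)$. The rest will be shown for the 5th term only. Define $H_\varepsilon(x, n\varepsilon, \xi) = f_x'(x, n\varepsilon) h_\varepsilon(x,\xi)$. Then we must show that

(6.3)  $$\varepsilon \sum_{l=n+1}^{\infty} E_n^\varepsilon [H_\varepsilon(X_{n+1}^\varepsilon, n\varepsilon, \xi_l^\varepsilon) - H_\varepsilon(X_n^\varepsilon, n\varepsilon, \xi_l^\varepsilon)] = o(\varepsilon).$$

By (the zeroth order) Taylor's formula with remainder, (6.3) equals

$$\varepsilon \sum_{l=n+1}^{\infty} \int_0^1 E_n^\varepsilon H_{\varepsilon,x}'(X_n^\varepsilon + s(X_{n+1}^\varepsilon - X_n^\varepsilon), n\varepsilon, \xi_l^\varepsilon)(X_{n+1}^\varepsilon - X_n^\varepsilon)\,ds,$$

which equals $O(\varepsilon^{3/2})$ by the $\phi$-mixing condition and the fact that $|X_{n+1}^\varepsilon - X_n^\varepsilon| \le K_N\sqrt{\varepsilon}$. The 6th term is treated similarly. Thus,

$$\operatorname{p-lim}\, (\text{5th} + \text{6th terms})/\varepsilon = 0.$$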


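The 7th and last term remains. Again, the 7th term differs by $o(\varepsilon)$ from the term obtained by replacing $n\varepsilon + \varepsilon$ by $n\varepsilon$ in the $f_x(\cdot,\cdot)$, and we replace $n\varepsilon + \varepsilon$ by $n\varepsilon$ there. Write $K_\varepsilon(x, n\varepsilon, \xi) = f_x'(x, n\varepsilon) g_\varepsilon(x,\xi)$. Then by applying (the first order) Taylor's formula with remainder, we can write the 7th term as

(6.4)  $$\begin{aligned}
o(\varepsilon) + \sqrt{\varepsilon} \sum_{l=n+1}^{\infty} E_n^\varepsilon \Bigl[ & K_{\varepsilon,x}'(X_n^\varepsilon, n\varepsilon, \xi_l^\varepsilon)(X_{n+1}^\varepsilon - X_n^\varepsilon) \\
& + \int_0^1 (1-s)(X_{n+1}^\varepsilon - X_n^\varepsilon)' K_{\varepsilon,xx}(X_n^\varepsilon + s(X_{n+1}^\varepsilon - X_n^\varepsilon), n\varepsilon, \xi_l^\varepsilon)(X_{n+1}^\varepsilon - X_n^\varepsilon)\,ds \Bigr].
\end{aligned}$$

The contribution of the "sum of the integrals" component of (6.4) is $O(\varepsilon^{3/2})$ by the $\phi$-mixing and the fact that $E K_{\varepsilon,xx}(x, n\varepsilon, \xi_l^\varepsilon) = 0$. Next, by collecting the terms of the first component of the sum in (6.4) according to their power of $\varepsilon$, we can write

(6.5)  $$(6.4) = o(\varepsilon) + \varepsilon \sum_{l=n+1}^{\infty} E_n^\varepsilon K_{\varepsilon,x}'(X_n^\varepsilon, n\varepsilon, \xi_l^\varepsilon)\, g_\varepsilon(X_n^\varepsilon,\xi_n^\varepsilon).$$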


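Part 3. Next, $f_2^\varepsilon(\cdot)$ will be introduced in a way which will "average out" the second term on the right side of (6.5). Note that, as $\varepsilon \to 0$ (and $n\varepsilon \to t$ in $f(\cdot,\cdot)$),

$$\sum_{j=1}^{\infty} E[f_x'(x, n\varepsilon) g_\varepsilon(x,\xi_j^\varepsilon)]_x'\, g_\varepsilon(x,\xi_0^\varepsilon)$$

converges uniformly to the quantity mentioned in the last sentence of (A6) plus $f_x'(x,t)\bar g(x)$ ($\bar g(\cdot)$ is defined in (A7)). Recall that we have suppressed the affixes $N$, and that the $g_\varepsilon(x,\xi)$, etc., used in the proof is actually (in terms of the original $g_\varepsilon(x,\xi)$, etc.) $g_\varepsilon(x,\xi)q_N(x)$, etc.

Next, $f_2^\varepsilon(\cdot)$ is to be chosen. It is to be constant on each interval $[n\varepsilon, n\varepsilon+\varepsilon)$ and satisfy $f_2^\varepsilon(n\varepsilon) = f_2^\varepsilon(X_n^\varepsilon, n\varepsilon)$, where $f_2^\varepsilon(x, n\varepsilon)$ is defined by

$$f_2^\varepsilon(x, n\varepsilon) = \varepsilon \sum_{l=n}^{\infty} \sum_{j=l+1}^{\infty} \bigl[ E_n^\varepsilon (f_x'(x, n\varepsilon) g_\varepsilon(x,\xi_j^\varepsilon))_x'\, g_\varepsilon(x,\xi_l^\varepsilon) - E (f_x'(x, n\varepsilon) g_\varepsilon(x,\xi_j^\varepsilon))_x'\, g_\varepsilon(x,\xi_l^\varepsilon) \bigr].$$

By the $\phi$-mixing and the fact that the centered summand has zero expectation, $|f_2^\varepsilon(x, n\varepsilon)| = O(\varepsilon)$ uniformly in $x$. Thus, $\operatorname{p-lim} f_2^\varepsilon(\cdot) = 0$.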

Next, evaluating $\hat A^\varepsilon f_2^\varepsilon(n\varepsilon)$ and using the stationarity of the $\{\xi_j^\varepsilon\}$


where 


By (A1a) and an argument using Taylor's formula similar to that used in connection with (6.3), we get that the last term of (6.6) is $o(\varepsilon)$. Also, the first term of (6.6) cancels the second term on the right of (6.4).


Part 4. Recall the definition of $f^\varepsilon(\cdot)$ given in Part 2. Adding the results of Parts 1-3, and deleting the terms of $\hat A^\varepsilon f^\varepsilon(\cdot)$ which cancel, yields (modulo the $q_N(\cdot)$ factors of $K_\varepsilon$, etc.)

Then, by the convergences (A3), (A6), and (A7) all of (4.4), (4.5) and (4.6) hold (the $M_T^{\varepsilon,N}(f)$ are bounded here), yielding tightness; also (4.1), (4.2) hold, yielding the asserted weak convergence.


Part 5. Owing to the special form of the functions under (A1b), essentially the same proof can be used. The only problem is that the $O(\cdot)$ and $o(\cdot)$ terms are not uniform in $\tilde\xi_{n-1}$. Let us examine some typical terms. Let $E_n$ denote conditioning on $\tilde\xi_i$, $i < n$. All the $O(\cdot)$ below are uniform in all variables other than their argument.

The essential part of the last term in the definition of $f_1^\varepsilon(n\varepsilon)$ is (the $f_x$ term is omitted)


*  I  E®geCx.ef)l  -  !✓£  ge(x)  I  LbMI  5 
t-n  n  *  6  £»n  n  * 


Next, examine the expression below (6.3). For some $c_1 > 0$, we can show that it is bounded above by


-  0(c5/2)B‘|Cn|Cl*|Tn|Cl)  l  |Rt|. 


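Also, by a careful use of iterated conditional expectations and (5.1), we can show that

$$|f_2^\varepsilon(n\varepsilon)| \le O(\varepsilon)(1 + |\tilde\xi_{n-1}|^2).$$

In fact, it can readily be shown that $|f_1^\varepsilon(n\varepsilon)|$ and all the $o(\varepsilon)/\varepsilon$ terms in $\hat A^\varepsilon f^\varepsilon(n\varepsilon)$ which appeared in Parts 1 to 4 are of the form $O(\varepsilon^\gamma)(1 + |\tilde\xi_{n-1}|^\delta)$ for some $\gamma > 0$ and $\delta > 0$. This, together with the fact that all moments exist, implies (4.1)-(4.2).

Next, note that for each $m \ge 1$,

$$P\{\sup_{n \le T/\varepsilon} \varepsilon^\gamma |\tilde\xi_n|^\delta \ge u\} \le \sum_{n \le T/\varepsilon} P\{\varepsilon^\gamma |\tilde\xi_n|^\delta \ge u\} \le \frac{T\, E|\tilde\xi_n|^{\delta m}\, \varepsilon^{\gamma m}}{\varepsilon\, u^m}.$$

By letting $\gamma m > 1$, (4.4) to (4.6) hold. Q.E.D.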


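7. Application to a Problem of Guess and Gillespie [1]

In [1], the scalar problem (7.1) is treated

(7.1)  $$X_{n+1}^\varepsilon = X_n^\varepsilon + f(S_n^\varepsilon) + (\exp g(S_n^\varepsilon) - 1)X_n^\varepsilon, \qquad X_0^\varepsilon = X_0,$$

under two different sets of conditions. The first is that, for some constants $\mu$, $\sigma > 0$ and $\{r_n\}$ such that $\sum_n |r_n| < \infty$, we have $ES_n^\varepsilon = \mu\varepsilon$, $\operatorname{var} S_n^\varepsilon = \sigma^2\varepsilon$, $\operatorname{Cov}(S_0^\varepsilon, S_n^\varepsilon) = \varepsilon\sigma^2 r_n$, with $\{(S_n^\varepsilon - ES_n^\varepsilon)/\sqrt{\varepsilon}\}$ satisfying the condition on $\{\xi_n^\varepsilon\}$ in (A1a). Under their second set of conditions the $\{S_n^\varepsilon\}$ are Gaussian, hence unbounded. Our method works in either case. We stick to the first case. Note that $|S_n^\varepsilon| \le K\sqrt{\varepsilon}$ for some real $K$. Let $f(\cdot)$, $g(\cdot)$ have continuous third derivatives with $f(0) = g(0) = 0$, and put (7.1) into the form (3.1) by expanding $\exp g(\cdot)$ and $f(\cdot)$. First write

(7.2)  $$\begin{aligned}
X_{n+1}^\varepsilon &= X_n^\varepsilon + Ef(S_n^\varepsilon) + E(\exp g(S_n^\varepsilon) - 1)X_n^\varepsilon \\
&\quad + \sqrt{\varepsilon}\, \frac{f(S_n^\varepsilon) - Ef(S_n^\varepsilon)}{\sqrt{\varepsilon}} + \sqrt{\varepsilon}\, \frac{(\exp g(S_n^\varepsilon) - 1) - E(\exp g(S_n^\varepsilon) - 1)}{\sqrt{\varepsilon}}\, X_n^\varepsilon.
\end{aligned}$$

Expanding $g(\cdot)$ and $f(\cdot)$ yields (the $o(\cdot)$ are uniform in all variables but their argument)

(7.3)  $$\begin{aligned}
X_{n+1}^\varepsilon &= X_n^\varepsilon + \varepsilon \Bigl[ f_s(0)\mu + \tfrac{1}{2} f_{ss}(0)\sigma^2 + \mu g_s(0) X_n^\varepsilon + \tfrac{1}{2}(g_{ss}(0) + g_s^2(0))\sigma^2 X_n^\varepsilon \Bigr] \\
&\quad + \sqrt{\varepsilon} \Bigl[ f_s(0)\frac{S_n^\varepsilon - ES_n^\varepsilon}{\sqrt{\varepsilon}} + g_s(0)\frac{(S_n^\varepsilon - ES_n^\varepsilon) X_n^\varepsilon}{\sqrt{\varepsilon}} \Bigr] \\
&\quad + \varepsilon \Bigl[ \tfrac{1}{2} f_{ss}(0)\frac{(S_n^\varepsilon)^2 - E(S_n^\varepsilon)^2}{\varepsilon} + \tfrac{1}{2}(g_{ss}(0) + g_s^2(0))\frac{((S_n^\varepsilon)^2 - E(S_n^\varepsilon)^2) X_n^\varepsilon}{\varepsilon} \Bigr] + o(\varepsilon) \\
&\equiv X_n^\varepsilon + \varepsilon K_\varepsilon(X_n^\varepsilon) + \varepsilon h_\varepsilon(X_n^\varepsilon,\xi_n^\varepsilon) + \sqrt{\varepsilon}\, g_\varepsilon(X_n^\varepsilon,\xi_n^\varepsilon) + o(\varepsilon),
\end{aligned}$$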
where $\xi_n^\varepsilon$, $K_\varepsilon(\cdot)$, $g_\varepsilon(\cdot)$ and $h_\varepsilon(\cdot)$ are defined in the obvious manner. Equation (7.3) is of the form (3.1) and the conditions of Theorem 3 hold. An immediate application of Theorem 3 yields that the $\{X^\varepsilon(\cdot)\}$ are tight and that the weak limit is the unique diffusion with the operator (the same result as [1])

$$A = [a_1 + a_2 x]\frac{\partial}{\partial x} + \frac{1}{2}(b_1 + b_2 x)^2 \frac{\partial^2}{\partial x^2},$$

$$a_1 = \mu f_s(0) + \tfrac{1}{2}\sigma^2 [(\eta^2 - 1) f_s(0) g_s(0) + f_{ss}(0)],$$

$$a_2 = \mu g_s(0) + \tfrac{1}{2}\sigma^2 [\eta^2 g_s^2(0) + g_{ss}(0)],$$

$$b_1 = \sigma\eta f_s(0), \qquad b_2 = \sigma\eta g_s(0), \qquad \eta^2 = 1 + 2\sum_{n=1}^{\infty} r_n.$$

Since our method does not require an explicit construction of the solution process (as essentially done in the proof in [1]), it is equally applicable to vector-valued versions of (7.1), and seems to give a bit more insight into the approximation process and the effects of variations in the data.

8. Rate of Convergence for a Stochastic Approximation

Stochastic approximations of the form

(8.1)  $$X_{n+1}^\varepsilon = X_n^\varepsilon + \varepsilon h(X_n^\varepsilon) + \varepsilon g(X_n^\varepsilon,\xi_n^\varepsilon)$$

have numerous applications in sequential Monte Carlo optimization in control theory. In [10], [11], convergence and rate of convergence are treated via a rather direct method. We use the notation of this paper rather than that of [10], [11]. Let $h(\cdot)$ and $g(\cdot,\cdot)$ satisfy (A5) and assume $Eg(x,\xi_n^\varepsilon) = 0$. Let $x(t) \equiv \theta$ denote a globally asymptotically stable solution to $\dot x = h(x)$ (which we assume exists). Thus, $h(\theta) = 0$. The properties of $\{X_n^\varepsilon\}$ for large $n$ and small $\varepsilon$ are of interest. To investigate them, define $U_n^\varepsilon = (X_n^\varepsilon - \theta)/\sqrt{\varepsilon}$. In [11], it is shown under appropriate conditions that there is an $N_\varepsilon \to \infty$ as $\varepsilon \to 0$ (unless $g(x,\xi) = g(x)\xi$, in which case the process can be centered so that $N_\varepsilon = 0$) such that $\{U_n^\varepsilon, n \ge N_\varepsilon, \varepsilon > 0\}$ is tight.

Let us assume this tightness here and study the asymptotics of $U^\varepsilon(\cdot)$, where $U^\varepsilon(t) = U_n^\varepsilon$ on $[n\varepsilon, n\varepsilon+\varepsilon)$. The "tail" of $U^\varepsilon(\cdot)$ for small $\varepsilon$ contains the rate of convergence information for $\{X_n^\varepsilon\}$, for small $\varepsilon$. By the assumed tightness

$$\lim_{\varepsilon \to 0} \limsup_{n \to \infty} P\{|X_n^\varepsilon - \theta| \ge \delta\} = 0, \quad \text{each } \delta > 0.$$

Now, by (8.1) we can write

K>-2>  U^1  ’  un  *  el»x(9)  * 

♦  *  e3/2tyG(X*),  U*)U*, 

where  Bq(*,*)  is  a  matrix  valued  bilinear  form,  G(>)  is  a  function 
which  is  bounded  on  bounded  x-sets  and  X*  e  [xf,X*V,].  Next  fix  a 
weakly convergent subsequence of $\{U_{N_\varepsilon}^\varepsilon\}$ with limit $U_0$. Then under suitable conditions on $\{g(\theta,\xi_n^\varepsilon)\}$, Theorem 3 and the truncation method can be applied to get Theorem 2 of [10]; in particular, $U(\cdot)$, the weak limit of $\{U^\varepsilon(\cdot)\}$, solves

(8.3)  $$dU = h_x(\theta)U\,dt + R^{1/2}\,dB, \qquad U(0) = U_0,$$

where $R = \lim_{\varepsilon \to 0} \sum_{j=-\infty}^{\infty} E g(\theta,\xi_0^\varepsilon) g'(\theta,\xi_j^\varepsilon)$ and $B(\cdot)$ is a standard Wiener process. This is one of the results of [11] and was proved there by solving (8.2) (modulo the $B_\theta$-terms) and essentially constructing the limit via weak convergence theory, a more arduous task than merely using Theorem 3 (although the proof in [11] has its own intrinsic interest, and the paper contains other interesting results).

If $N_\varepsilon$ is chosen such that $\varepsilon N_\varepsilon \to \infty$ as $\varepsilon \to 0$, then the weak limit $U(\cdot)$ of the original $\{U^\varepsilon(\cdot)\}$ is the stationary solution to (8.3). The technique can also be applied to the rate of convergence problem for stochastic approximations where the $\varepsilon$ in (8.1) is replaced by $\varepsilon_n$ and $\varepsilon_n \to 0$ as $n \to \infty$, with $\sum_n \varepsilon_n = \infty$, although we will not pursue this. In this case, the interpolation intervals would be $\varepsilon_n$, rather than the constant $\varepsilon$.
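The stationary solution mentioned above can be checked numerically in the simplest setting. The sketch below is a hedged illustration under strong assumptions that are not in the text: a scalar problem, linear dynamics $h(x) = -a(x - \theta)$ with $a > 0$, and state-independent i.i.d. noise $g(x,\xi) = \xi$ with variance `sigma2`, so that $R$ equals `sigma2`. In that case the normalized error satisfies $U_{n+1} = (1 - \varepsilon a)U_n + \sqrt{\varepsilon}\,\xi_n$ exactly, and its variance can be iterated without simulation. The function names are hypothetical.

```python
def stationary_variance_discrete(eps, a, sigma2, n_iter=50000):
    """Iterate the exact variance recursion of U[n+1] = (1 - eps*a)*U[n] + sqrt(eps)*xi[n].

    For i.i.d. zero mean noise with variance sigma2, independent of U[n]:
        Var U[n+1] = (1 - eps*a)**2 * Var U[n] + eps * sigma2.
    Requires 0 < eps*a < 2 so that the recursion contracts.
    """
    v = 0.0
    for _ in range(n_iter):
        v = (1.0 - eps * a) ** 2 * v + eps * sigma2
    return v


def stationary_variance_limit(a, R):
    """Stationary variance of the scalar diffusion dU = -a*U dt + sqrt(R) dB."""
    return R / (2.0 * a)


v_eps = stationary_variance_discrete(eps=1e-3, a=1.0, sigma2=1.0)
v_lim = stationary_variance_limit(a=1.0, R=1.0)
```

The fixed point of the recursion is `sigma2 / (2*a - eps*a**2)`, which tends to `R / (2*a)` as `eps` goes to zero, in agreement with the stationary variance of the limiting linear diffusion in this special case.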

9.  State  Dependent  Noise 

Under broad conditions, the treatment for state dependent
noise is very similar to that given in Theorem 3, and only two cases
will be discussed. For each $\varepsilon > 0$, $\{\psi_n^\varepsilon\}$
denotes a stationary bounded sequence. If the $\{\psi_n^\varepsilon\}$ are
mutually independent, then the $\{\xi_{n-1}^\varepsilon, X_n^\varepsilon\}$
below are Markovian and, as will be seen, the smoothness of
$Q_\varepsilon$ in (A9) below can then be weakened. The scheme introduced
here covers some interesting cases, but it should be seen as being
typical of the possibilities, as indicated by the example in
Section 10, which uses a slightly different setup. We will need:

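(A9)  There is a function $Q_\varepsilon(\cdot,\cdot,\cdot)$ such that
$\xi_\ell^\varepsilon = Q_\varepsilon(X_\ell^\varepsilon, \xi_{\ell-1}^\varepsilon, \psi_\ell^\varepsilon)$,
where $Q_\varepsilon$ has continuous (uniformly in $\varepsilon$, $\psi$, in
bounded $(x,\xi)$ sets) partial $(x,\xi)$ derivatives up to second order.
For each sphere $S_N$ there is a sphere $S_{N,1}$ such that
$\xi_\ell^\varepsilon$ remains in $S_{N,1}$ as long as $X_\ell^\varepsilon$
remains in $S_N$.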


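It is now convenient to introduce some auxiliary processes
which will be used in the construction of the $f_i^\varepsilon(\cdot)$. For
each n, define the processes
$P_n(x) = \{\xi_\ell^\varepsilon(x), \xi_{\ell,x}^\varepsilon(x), \xi_{\ell,xx}^\varepsilon(x),\ \ell \ge n\}$
as follows:

(9.1)  [the defining relations for $\xi_\ell^\varepsilon(x)$ and
       $\xi_{\ell,x}^\varepsilon(x)$ are illegible in the source]
       $\xi_{\ell,xx}^\varepsilon(x)$ = collection of second partial
       derivatives of the components of $\xi_\ell^\varepsilon(x)$.

The initial conditions on $P_n(x)$ will be given below, when
$f_1^\varepsilon$ and $f_2^\varepsilon$ are defined.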


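(A10)  For each x, there is a unique stationary process
$\bar P(x) = \{\bar\xi_\ell^\varepsilon(x), \bar\xi_{\ell,x}^\varepsilon(x), \bar\xi_{\ell,xx}^\varepsilon(x),\ \infty > \ell > -\infty\}$
satisfying [text missing in source] uniformly in bounded x-sets
[text missing in source] second (resp., third) member is the
x-derivative of the first (resp., second) member.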


Some sort of mixing condition on $P_n(x)$ is needed, as is some
condition on the rate of convergence of the distributions of the
"tails" of $P_n(x)$ to those of the stationary process $\bar P(x)$. A
general type of mixing condition (as in (A1)) can be used, but we
prefer here to introduce the conditions in the weakest and most
explicit form (A11), (A12) that we can use. This is because, owing
to the double requirement just mentioned, [remainder of sentence
illegible in source]. There are many sets of sufficient conditions which
imply (A11)-(A12), and there seems little point in restricting the
applicability of the result. Note that (A12) is the weakest condition
which is usable in Theorem 3 in place of (A1) in the
non-state-dependent noise case. Below, the subscript x denotes a total
derivative; i.e., for a smooth function $a(\cdot)$, [formula missing in
source].


Conditions (A11) and (A12) can be combined, but it is perhaps more
natural (for purposes of verification) to keep them as they are.
(A11) concerns the rate of convergence of the distributions of
$P_n(x)$ to those of $\bar P(x)$ for "non-stationary" initial conditions for
$P_n(x)$, while (A12) is more of a mixing condition, concerning the rate
of convergence
$[E_n^\varepsilon q(\xi_{\ell+n}^\varepsilon(x), \xi_{\ell+n,x}^\varepsilon(x), \xi_{\ell+n,xx}^\varepsilon(x)) - E q(\xi_{\ell+n}^\varepsilon(x), \xi_{\ell+n,x}^\varepsilon(x), \xi_{\ell+n,xx}^\varepsilon(x))] \to 0$
for smooth $q(\cdot)$, as $\ell \to \infty$.

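(A11)  For each $f(\cdot,\cdot) \in$ [function class symbol illegible in
source], the sums (9.2), (9.3) converge (absolutely) uniformly in x, t, n
and in the initial conditions $\xi_n^\varepsilon(x), \xi_{n,x}^\varepsilon(x)$
in bounded sets. Define

$H_\ell^\varepsilon(x) = E\, g_\varepsilon'(x, \xi_\ell^\varepsilon(x))\, f_{xx}(x,t)\, g_\varepsilon(x, \xi_\ell^\varepsilon(x))$
$\qquad -\ E\, g_\varepsilon'(x, \bar\xi_\ell^\varepsilon(x))\, f_{xx}(x,t)\, g_\varepsilon(x, \bar\xi_\ell^\varepsilon(x))$,

$I_{j,\ell}^\varepsilon(x) = E\, (g_\varepsilon'(x, \xi_j^\varepsilon(x))\, f_x(x,t))_x'\, g_\varepsilon(x, \xi_\ell^\varepsilon(x))$
$\qquad -\ E\, (g_\varepsilon'(x, \bar\xi_j^\varepsilon(x))\, f_x(x,t))_x'\, g_\varepsilon(x, \bar\xi_\ell^\varepsilon(x))$.

(9.2)  $\sum_{\ell=n}^{\infty} H_\ell^\varepsilon(x)$,  $\sum_{\ell=n}^{\infty} (H_\ell^\varepsilon(x))_x$,

(9.3)  $\sum_{\ell=n}^{\infty} \sum_{j=\ell+1}^{\infty} I_{j,\ell}^\varepsilon(x)$,  $\sum_{\ell=n}^{\infty} \sum_{j=\ell+1}^{\infty} (I_{j,\ell}^\varepsilon(x))_x$.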

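Let $E_n^\varepsilon$ denote conditioning on $\psi_j^\varepsilon$, $j < n$,
and on the initial condition of $P_n(x)$, namely,
$(\xi_n^\varepsilon(x), \xi_{n,x}^\varepsilon(x), \xi_{n,xx}^\varepsilon(x))$.
Owing to the stationarity of $\{\psi_\ell^\varepsilon\}$, we could set n = 0
in (A11).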

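(A12)  Define ($j > \ell \ge n$)

$J_\ell^\varepsilon(x) = E_n^\varepsilon f_x'(x,t)\, h_\varepsilon(x, \xi_\ell^\varepsilon(x))$,

$K_\ell^\varepsilon(x) = E_n^\varepsilon g_\varepsilon'(x, \xi_\ell^\varepsilon(x))\, f_{xx}(x,t)\, g_\varepsilon(x, \xi_\ell^\varepsilon(x))$
$\qquad -\ E\, g_\varepsilon'(x, \xi_\ell^\varepsilon(x))\, f_{xx}(x,t)\, g_\varepsilon(x, \xi_\ell^\varepsilon(x))$,

$L_\ell^\varepsilon(x) = E_n^\varepsilon f_x'(x,t)\, g_\varepsilon(x, \xi_\ell^\varepsilon(x))$,

$M_{j,\ell}^\varepsilon(x) = E_n^\varepsilon (g_\varepsilon'(x, \xi_j^\varepsilon(x))\, f_x(x,t))_x'\, g_\varepsilon(x, \xi_\ell^\varepsilon(x))$
$\qquad -\ E\, (g_\varepsilon'(x, \xi_j^\varepsilon(x))\, f_x(x,t))_x'\, g_\varepsilon(x, \xi_\ell^\varepsilon(x))$.

The sums (9.4), (9.5), (9.6) below converge (absolutely) uniformly in
$\omega$, t, x, n and initial conditions for $P_n(x)$ in bounded sets, for
each $f(\cdot) \in$ [function class symbol illegible in source].

(9.4)  $\sum_{\ell=n}^{\infty} J_\ell^\varepsilon(x)$,  $\sum_{\ell=n}^{\infty} (J_\ell^\varepsilon(x))_x$,  and also for K, L replacing J.

(9.5)  $\sum_{\ell=n}^{\infty} (L_\ell^\varepsilon(x))_{xx}$,

(9.6)  $\sum_{\ell=n}^{\infty} \sum_{j=\ell+1}^{\infty} M_{j,\ell}^\varepsilon(x)$,  $\sum_{\ell=n}^{\infty} \sum_{j=\ell+1}^{\infty} (M_{j,\ell}^\varepsilon(x))_x$.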


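Remark.  (A11)-(A12) hold if $\{\psi_n^\varepsilon\}$ is mutually
independent, $h_\varepsilon$ and $g_\varepsilon$ are multiplicative in $\xi$
and there are $|\alpha| < 1$ and twice continuously differentiable (in x)
$b(\cdot,\cdot)$ such that

$\xi_\ell^\varepsilon = \alpha\, \xi_{\ell-1}^\varepsilon + b(X_\ell^\varepsilon, \psi_\ell^\varepsilon)$  [remainder of display illegible in source].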

A Note on the Markov Case.  For the case where
$\{\psi_n^\varepsilon\}$ are mutually independent for each $\varepsilon > 0$,
$\{X_n^\varepsilon, \xi_{n-1}^\varepsilon\}$ is a Markov process if
$Q_\varepsilon$ is merely a Borel function. Then $E_n^\varepsilon$ in (A12)
and in (9.7), (9.8) below can be replaced by conditioning on
$X_n^\varepsilon, \xi_{n-1}^\varepsilon$ only.

Instead of differentiating the functions in the sums in
$f_1^\varepsilon$ and $f_2^\varepsilon$ with respect to x, as done in the
proof of Theorem 3, it is only necessary that the
$E_n^\varepsilon f_x'(x, \varepsilon n)\, h_\varepsilon(x, \xi_\ell^\varepsilon(x))$,
$E\, g_\varepsilon'(x, \xi_\ell^\varepsilon(x))\, f_{xx}(x,t)\, g_\varepsilon(x, \xi_\ell^\varepsilon(x))$,
etc., be differentiated. Often the latter functions are smooth functions
of x, uniformly for [symbol illegible in source] in bounded sets - even
for $Q_\varepsilon$ not satisfying the smoothness required by (A9). In that
case, the proof of Theorem 3 can be carried out, but with derivatives of
the conditional expectations and expectations of the functions being
taken in lieu of the derivatives of the functions themselves. Of course,
the same can be said for the case of Theorem 3, if
$\{\xi_n^\varepsilon, X_n^\varepsilon\}$ is Markov there.

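To be more specific, in (9.2)-(9.6) replace (with the exception
of the $I_{j,\ell}^\varepsilon(x)$ and $M_{j,\ell}^\varepsilon(x)$ terms) the
$E(\cdot)_x$ and $E_n^\varepsilon(\cdot)_x$ by $(E(\cdot))_x$ and
$(E_n^\varepsilon(\cdot))_x$, resp. Replace $M_{j,\ell}^\varepsilon(x)$ and
$I_{j,\ell}^\varepsilon(x)$ by, resp.,

$M_{j,\ell}^\varepsilon(x) = E_n^\varepsilon[(E_\ell^\varepsilon g_\varepsilon'(x, \xi_j^\varepsilon(x))\, f_x(x,t))_x'\, g_\varepsilon(x, \xi_\ell^\varepsilon(x))]$
$\qquad -\ E[(E_\ell^\varepsilon g_\varepsilon'(x, \xi_j^\varepsilon(x))\, f_x(x,t))_x'\, g_\varepsilon(x, \xi_\ell^\varepsilon(x))]$,

$I_{j,\ell}^\varepsilon(x) = E[(E_\ell^\varepsilon g_\varepsilon'(x, \xi_j^\varepsilon(x))\, f_x(x,t))_x'\, g_\varepsilon(x, \xi_\ell^\varepsilon(x))]$
$\qquad -\ E[(E_\ell^\varepsilon g_\varepsilon'(x, \bar\xi_j^\varepsilon(x))\, f_x(x,t))_x'\, g_\varepsilon(x, \bar\xi_\ell^\varepsilon(x))]$,

where $E_\ell^\varepsilon$ denotes conditioning on the Markov state at time
$\ell$, whatever the process. Finally, replace the summand in (A7) by

$E\, (E_0^\varepsilon g_\varepsilon(x, \bar\xi_j^\varepsilon(x)))_x\, g_\varepsilon(x, \bar\xi_0^\varepsilon(x))$.

Then, assuming that the above derivatives and conditional expectations
and expectations are well defined, Theorem 4 continues to hold.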

Theorem 4.  Assume (A2)-(A12), where in (A4), (A6), (A7),
$\bar\xi_\ell^\varepsilon(x)$ replaces $\xi_\ell^\varepsilon$ and the
derivative in (A7) is a total derivative. Then the conclusion of
Theorem 3 continues to hold.


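Proof.  The proof is the same as that of Theorem 3. The only
change is that we use the following for the $f_i^\varepsilon$. First,
$f_1^\varepsilon(n\varepsilon) = f_1^\varepsilon(X_n^\varepsilon, n\varepsilon)$
where, analogously to the case in Theorem 3,

(9.7)  $f_1^\varepsilon(x, n\varepsilon) = \sqrt{\varepsilon} \sum_{j=n}^{\infty} E_n^\varepsilon f_x'(x, n\varepsilon)\, g_\varepsilon(x, \xi_j^\varepsilon(x))$
       $+\ \varepsilon \sum_{\ell=n}^{\infty} E_n^\varepsilon f_x'(x, \varepsilon n)\, h_\varepsilon(x, \xi_\ell^\varepsilon(x))$
       $+\ \frac{\varepsilon}{2} \sum_{\ell=n}^{\infty} [E_n^\varepsilon g_\varepsilon'(x, \xi_\ell^\varepsilon(x))\, f_{xx}(x, \varepsilon n)\, g_\varepsilon(x, \xi_\ell^\varepsilon(x))$
       $\qquad -\ E\, g_\varepsilon'(x, \bar\xi_\ell^\varepsilon(x))\, f_{xx}(x, \varepsilon n)\, g_\varepsilon(x, \bar\xi_\ell^\varepsilon(x))]$,

and the initial condition for $\{\xi_\ell^\varepsilon(X_n^\varepsilon),\ \ell \ge n\}$
is $\xi_n^\varepsilon(X_n^\varepsilon) = \xi_n^\varepsilon$.
Next, we use $f_2^\varepsilon(n\varepsilon) = f_2^\varepsilon(X_n^\varepsilon, n\varepsilon)$, where

(9.8)  $f_2^\varepsilon(x, n\varepsilon) = \varepsilon \sum_{\ell=n}^{\infty} \sum_{j=\ell+1}^{\infty} [E_n^\varepsilon (f_x'(x, n\varepsilon)\, g_\varepsilon(x, \xi_j^\varepsilon(x)))_x'\, g_\varepsilon(x, \xi_\ell^\varepsilon(x))$
       $\qquad -\ E\, (f_x'(x, n\varepsilon)\, g_\varepsilon(x, \bar\xi_j^\varepsilon(x)))_x'\, g_\varepsilon(x, \bar\xi_\ell^\varepsilon(x))]$

and with initial conditions [remainder illegible in source].

With the total derivative, $g_{\varepsilon,x}(x, \bar\xi_j^\varepsilon(x))$ in
(A7) is replaced by
$g_{\varepsilon,x}(x, \bar\xi_j^\varepsilon(x)) + g_{\varepsilon,\xi}(x, \bar\xi_j^\varepsilon(x))\, \bar\xi_{j,x}^\varepsilon(x)$.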


The forms of $\{\xi_n^\varepsilon(x), \bar\xi_n^\varepsilon(x)\}$ and
[symbol illegible in source] are chosen so that, following the method of
Theorem 3, we get the correct cancellations and the centering about the
correct - and intuitively reasonable - "mean values" which make up the
operator of the limit process.


A simple but interesting example will be given. It does not
quite fit the format of Theorem 4, but can be handled by a very
similar development, and it suggests some interesting possibilities,
as well as indicating the potential power of the general method of
the paper. We consider a very simple form of the Kiefer-Wolfowitz
stochastic approximation for minimizing a function q(x). Here
$q(x) = \frac{1}{2} x'Kx$, K positive definite and symmetric, and it is
assumed that the derivative plus noise is observed, and the coefficient
sequence is constant; i.e., the iterate sequence is

(10.1)  $X_{n+1}^\varepsilon = X_n^\varepsilon - \varepsilon\beta\,(K X_n^\varepsilon + \psi_n)$,

where $X_n^\varepsilon$ is the n-th estimate of the minimizing value 0 of
q(x) and the $\{\psi_n\}$ are mutually independent. It is occasionally
suggested that, if observations at successive parameter points were
averaged in some way, then convergence would be improved.


This question will now be investigated. A more general form for
$q(\cdot)$ could be used, but we stick to the simple case in order to
illustrate the main point as efficiently as possible.

Instead of (10.1), we deal with (10.2), where the successive
observations are geometrically weighted for use in the iteration.


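(10.2)  $X_{n+1}^\varepsilon = X_n^\varepsilon - \varepsilon\, \xi_n^\varepsilon$,  $0 < \alpha < 1$,  $\beta > 0$,
        $\xi_n^\varepsilon = \alpha\, \xi_{n-1}^\varepsilon + \beta\,(K X_n^\varepsilon + \psi_n)$,  $\{\psi_n\}$ bounded, identically distributed.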


If $\alpha = 0$, then (10.2) reduces to (10.1). For $\alpha > 0$, there is
some averaging. Define $X^\varepsilon(\cdot)$ as in Section 1. The
conditions of Theorem 4 all hold and by Theorem 4,
$\{X^\varepsilon(\cdot)\}$ is tight and converges (in $D^r[0,\infty)$, for
some integer r) weakly to the solution of
$\dot x = -(\beta K/(1-\alpha))\, x$.
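
The factor $1/(1-\alpha)$ in the limit drift can be checked by freezing the
state. The following is a minimal sketch, assuming the scalar recursion
$\xi_\ell(x) = \alpha\, \xi_{\ell-1}(x) + \beta(Kx + \psi_\ell)$ with
$E\psi_\ell = 0$, so that the mean $m_\ell = E\xi_\ell(x)$ obeys
$m_\ell = \alpha m_{\ell-1} + \beta K x$. The function names are
hypothetical and are not taken from the paper.

```python
def frozen_state_mean(alpha, beta, K, x, n_steps, m0=0.0):
    """Iterate m_l = alpha * m_{l-1} + beta * K * x for a frozen state x.

    This is the mean of the geometrically weighted observation when the
    noise has mean zero (an assumption of this sketch).
    """
    m = m0
    for _ in range(n_steps):
        m = alpha * m + beta * K * x
    return m


def averaged_drift(alpha, beta, K, x):
    """Right-hand side of the limit ODE: -(beta*K/(1-alpha)) * x."""
    return -beta * K * x / (1.0 - alpha)


if __name__ == "__main__":
    alpha, beta, K, x = 0.5, 0.5, 2.0, 3.0
    m = frozen_state_mean(alpha, beta, K, x, n_steps=200)
    # The iterate moves by -eps * xi, so the averaged drift is minus the mean.
    print(-m, averaged_drift(alpha, beta, K, x))
```

The fixed point of the mean recursion is $\beta K x/(1-\alpha)$, which is
the stationary mean about which the centering is done.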


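Let us center $\{X_n^\varepsilon\}$ about its "mean value" and examine
the rate of convergence. Define
$\hat X_n^\varepsilon, \hat\xi_n^\varepsilon, \tilde X_n^\varepsilon, \tilde\xi_n^\varepsilon$ by

$\hat X_{n+1}^\varepsilon = \hat X_n^\varepsilon - \varepsilon\, \hat\xi_n^\varepsilon$,  $\hat X_0^\varepsilon = X_0^\varepsilon$,
$\hat\xi_n^\varepsilon = \alpha\, \hat\xi_{n-1}^\varepsilon + \beta K \hat X_n^\varepsilon$,  $\hat\xi_0^\varepsilon = \xi_0^\varepsilon$,

$\tilde X_{n+1}^\varepsilon = \tilde X_n^\varepsilon - \varepsilon\, \tilde\xi_n^\varepsilon$,  $\tilde X_0^\varepsilon = 0$,
$\tilde\xi_n^\varepsilon = \alpha\, \tilde\xi_{n-1}^\varepsilon + \beta\,(K \tilde X_n^\varepsilon + \psi_n)$,  $\tilde\xi_0^\varepsilon = 0$.

Then $X_n^\varepsilon = \hat X_n^\varepsilon + \tilde X_n^\varepsilon$. Let
$U_n^\varepsilon = \tilde X_n^\varepsilon/\sqrt{\varepsilon}$. Then

(10.3)  $U_{n+1}^\varepsilon = U_n^\varepsilon - \sqrt{\varepsilon}\, \tilde\xi_n^\varepsilon$,
        $\tilde\xi_n^\varepsilon = \alpha\, \tilde\xi_{n-1}^\varepsilon + \beta\,(K \sqrt{\varepsilon}\, U_n^\varepsilon + \psi_n)$.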


Now define $\{\xi_\ell^\varepsilon(u)\}$ by

(10.4)  $\xi_\ell^\varepsilon(u) = \alpha\, \xi_{\ell-1}^\varepsilon(u) + \beta\,(K \sqrt{\varepsilon}\, u + \psi_\ell)$,  $\ell \ge n$, each n,

where the initial conditions are assigned at n as they were for
$\{\xi_\ell^\varepsilon(x),\ \ell \ge n\}$ in Theorem 4. We analyze (10.3),
which differs from the situation in Theorem 4 owing to the
$\sqrt{\varepsilon}$ in the $\tilde\xi^\varepsilon$ equation and the fact
that $E\bar\xi_\ell^\varepsilon(u)$ is not zero unless u = 0 (the bar
denotes the associated stationary process). Nevertheless, an almost
identical development to that of Theorems 3 or 4 can be carried out by
following the technique of Theorem 3 and introducing
$f_1^\varepsilon, f_2^\varepsilon$ with the correct centering, very similar
to what was done in the discussion concerning Theorem 4. We simply state
the result in the scalar case. $\{U^\varepsilon(\cdot)\}$ is tight in
$D[0,\infty)$ and converges weakly to the diffusion $U(\cdot)$:


(10.5)  $dU = -\frac{\beta K}{1-\alpha}\, U\, dt + C\, dW$,  $U(0) = 0$,  $C^2 = \frac{\sigma^2 \beta^2}{(1-\alpha)^2}$,

where $\sigma^2 = E\psi^2$. If $\{U_n^\varepsilon,\ \varepsilon > 0,\ n \ge 0\}$
is tight, then so is $\{U^\varepsilon(t_\varepsilon + \cdot),\ \varepsilon > 0\}$
for any sequence $t_\varepsilon \to \infty$, and the weak limit of the
last sequence is the stationary solution to (10.5).

The stationary variance of (10.5) is $C^2(1-\alpha)/2\beta K$ or
$\sigma^2\beta/2K(1-\alpha)$. If $\beta = (1-\alpha)$ then the mean rate of
convergence as well as the asymptotic normalized variance do not depend
on $\alpha$. For smaller $\beta$, the mean rate is lower, the rate of
decrease of the covariance of $U(\cdot)$ slower, but the asymptotic
normalized variance is smaller. So there is no clear advantage to
averaging, although other schemes may yield different results. The
format developed in this paper seems to be quite suitable for the
analysis of such problems.

†Here, $g_\varepsilon(u,\xi) = -\xi$.
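
The stationary variance can be checked numerically without simulation. The
sketch below assumes the scalar form of (10.2), namely
$X_{n+1} = X_n - \varepsilon \xi_n$ and
$\xi_n = \alpha \xi_{n-1} + \beta(K X_n + \psi_n)$ with i.i.d. zero-mean
$\psi_n$ of variance $\sigma^2$ (the independence and zero mean are
assumptions of this sketch). It iterates the exact covariance recursion of
the pair $(X_n, \xi_{n-1})$ and compares $\mathrm{Var}(X_n)/\varepsilon$
with $\sigma^2\beta/2K(1-\alpha)$. All function and variable names are
hypothetical.

```python
def stationary_normalized_variance(alpha, beta, K, sigma2, eps, n_steps=20000):
    """Var(X_n)/eps for the assumed scalar scheme, started from zero covariance.

    State z_n = (X_n, xi_{n-1}) obeys z_{n+1} = A z_n + b * psi_n with
        A = [[1 - eps*beta*K, -eps*alpha], [beta*K, alpha]],
        b = [-eps*beta, beta].
    The covariance P = [[p, q], [q, r]] then obeys P <- A P A^T + sigma2 * b b^T.
    """
    a11, a12 = 1.0 - eps * beta * K, -eps * alpha
    a21, a22 = beta * K, alpha
    b1, b2 = -eps * beta, beta
    p = q = r = 0.0
    for _ in range(n_steps):
        # M = A P
        m11 = a11 * p + a12 * q
        m12 = a11 * q + a12 * r
        m21 = a21 * p + a22 * q
        m22 = a21 * q + a22 * r
        # P <- M A^T + sigma2 * b b^T (only the upper triangle is needed)
        p, q, r = (m11 * a11 + m12 * a12 + sigma2 * b1 * b1,
                   m11 * a21 + m12 * a22 + sigma2 * b1 * b2,
                   m21 * a21 + m22 * a22 + sigma2 * b2 * b2)
    return p / eps


def limit_variance(alpha, beta, K, sigma2):
    """Stationary variance of (10.5): sigma^2 * beta / (2 K (1 - alpha))."""
    return sigma2 * beta / (2.0 * K * (1.0 - alpha))


if __name__ == "__main__":
    for alpha, beta in ((0.0, 1.0), (0.5, 0.5), (0.5, 0.25)):
        print(alpha, beta,
              stationary_normalized_variance(alpha, beta, 1.0, 1.0, 0.001),
              limit_variance(alpha, beta, 1.0, 1.0))
```

With $\beta = 1-\alpha$ the first two parameter pairs give nearly the same
value, and the smaller $\beta$ in the third pair gives a smaller variance,
in line with the discussion above.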